An open source software package for automated extraction of ITS1 and ITS2 from fungal ITS sequences for use in high-throughput community assays and molecular ecology
Authors
Abstract
We introduce an open source software utility to extract the highly variable ITS1 and ITS2 subregions from fungal nuclear ITS sequences, the region of choice for environmental sampling and molecular identification of fungi. Inclusion of parts of the neighbouring, very conserved, ribosomal genes in the sequence identification process regularly leads to distorted results. The utility is available for UNIX-type operating systems, including MacOS X, and processes about 1000 sequences per minute.
© 2010 Elsevier Ltd and The British Mycological Society. All rights reserved.
* Corresponding author. Department of Plant and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden. Tel.: +46 31 786 2623; fax: +46 31 786 2560. E-mail address: [email protected] (R.H. Nilsson).
Available at www.sciencedirect.com; journal homepage: www.elsevier.com/locate/funeco. 1754-5048/$ - see front matter. doi:10.1016/j.funeco.2010.05.002. Fungal Ecology 3 (2010) 284–287.
Fungi form a ubiquitous group of heterotrophic organisms with a largely subterranean or otherwise inconspicuous life cycle (Blackwell et al. 2006). The poor correspondence between aboveground fruit bodies and the diversity of the fungal community below ground (or in the substrate) has precluded a detailed understanding of the species composition of fungal communities, although the access to DNA sequence data is starting to change this (Peay et al. 2008; Hibbett et al. 2009). The most commonly sequenced genetic marker for molecular identification of fungi from environmental samples is the internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit (Ryberg et al. 2009; Abarenkov et al. 2010).
It is cumbersome to attain the sequence depth needed for reasonably accurate views of the underlying community, but emerging sequencing technologies such as massively parallel ("454") pyrosequencing (Margulies et al. 2005) address this and shift the focus from sequence depth to sequence processing and quality control (Huse et al. 2007; Shendure & Ji 2008; Galand et al. 2009; Kunin et al. 2010). Massively parallel pyrosequencing generates reads shorter than those obtained by traditional Sanger sequencing. As a result, sequencing the ITS region in full (450–650+ bp.) is presently impossible. The ITS offers three potential target subregions for which numerous PCR primers are readily available: ITS1, 5.8S, and ITS2 (Fig 1). The ITS1 is highly variable and about 180 base pairs (bp.) in length. The ITS2 is nearly as variable although slightly shorter (~170 bp.); the lengths of ITS1 and ITS2 do, however, vary substantially among taxa (Nilsson et al. 2008). The intercalary 5.8S gene (~160 bp.) is very conserved and can be aligned across the fungal phyla. The flanking ribosomal genes, nuclear small subunit (nSSU/18S) upstream of ITS1 and nuclear large subunit (nLSU/28S) downstream of ITS2, make good primer anchors (for ITS1 and ITS2, respectively), with the intercalary 5.8S serving as the second anchor region (cf. Buée et al. 2009; Jumpponen & Jones 2009). Depending on how far into these genes the primer sites are, however, the residual portions of the genes left in the ITS sequence may skew sequence similarity searches involving, e.g., BLAST (Altschul et al. 1997). These extra sequence segments, many times more conserved than ITS1 and ITS2, will always find matches in the sequence databases, even when ITS1 and ITS2 do not, and so will invariably add to the length of the BLAST alignment.
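The effect of a conserved flank on hit ranking can be illustrated with a deliberately simplified toy example. The sketch below ranks database hits by the number of identical aligned positions, which is only a crude stand-in for BLAST scoring; the hit names and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    flank_matches: int  # identical positions within the conserved gene flank
    its1_matches: int   # identical positions within ITS1 itself


def best_hit(hits, include_flank):
    """Return the hit with the most identical positions.

    This is a toy proxy for an alignment score, not BLAST's actual scoring.
    """
    if include_flank:
        return max(hits, key=lambda h: h.flank_matches + h.its1_matches)
    return max(hits, key=lambda h: h.its1_matches)


# Hypothetical query: 40 bp of conserved nSSU followed by 180 bp of ITS1.
hits = [
    # Database entry of the same species that happens to lack the flank.
    Hit("same_species_no_flank", flank_matches=0, its1_matches=176),
    # Entry of another species that includes the conserved flank.
    Hit("other_species_with_flank", flank_matches=40, its1_matches=153),
]

with_flank = best_hit(hits, include_flank=True)
its1_only = best_hit(hits, include_flank=False)
print(with_flank.name, its1_only.name)
```

In this toy setting the flank alone is enough to change which entry ranks first, which is the kind of distortion that extracting ITS1 or ITS2 before the search is meant to avoid.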
The inflated alignment length makes automated interpretation of the BLAST results problematic and regularly has the effect that a different sequence, or even species, is presented as the best match than if ITS1 or ITS2 alone had been analyzed (Bruns & Shefferson 2004). This was observed for 11 % of the 86000 ITS-based BLAST searches studied by Nilsson et al. (2009). Sequence clustering into hypothetically conspecific taxonomic units may similarly be distorted by these segments. Though conserved, the nSSU and nLSU are variable enough that they cannot be located and deleted using regular expressions or pattern matching for a wide selection of fungi. They can be removed manually given a multiple alignment and a primer chart or an annotated reference alignment such as the one provided by Hibbett et al. (1995), but this becomes unfeasible as datasets grow. The present study introduces a software utility to extract the ITS1 and ITS2 from large fungal ITS datasets. The software accounts for partial sequence data, such as when only nSSU and half of ITS1 are available, as well as input sequences in the reverse complementary direction. It is available at http://www.emerencia.org/FungalITSextractor.html (Supplementary material 1) for UNIX-based operating systems, including MacOS X. The software is written in Perl and processes FASTA format (Pearson & Lipman 1988) input sequences sequentially. ITS1 and ITS2 are located using long (30–50 bp.) and short (18–25 bp.) hidden Markov models (HMMs) computed in the HMMER package (v. 2; Eddy 1998) from inclusive alignments of the nSSU (3′ region; Tehler et al. 2003), 5.8S (5′ and 3′ regions; Nilsson et al. 2008), and nLSU (5′ region; James et al. 2006). The query sequence is compared to the long HMMs for each of nSSU, 5.8S (5′ and 3′), and nLSU using the HMMER package. If the boundaries of these genes can be detected, ITS1 and ITS2 are extracted from the sequence based on those positions.
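The position-based extraction step can be sketched as follows. This is a minimal Python illustration, not the authors' Perl implementation: the `Boundaries` type, the function names, and the 0-based half-open coordinates are assumptions made here, and the boundary positions are taken as given rather than computed with HMMER.

```python
from typing import NamedTuple, Optional, Tuple


class Boundaries(NamedTuple):
    """Hypothetical container for detected gene boundaries (0-based indices).

    A value of None means the corresponding gene end was not detected.
    """
    nssu_end: Optional[int]    # index just past the last nSSU base
    s58_start: Optional[int]   # index of the first 5.8S base
    s58_end: Optional[int]     # index just past the last 5.8S base
    nlsu_start: Optional[int]  # index of the first nLSU base


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract_subregions(seq: str, b: Boundaries) -> Tuple[Optional[str], Optional[str]]:
    """Cut ITS1 and ITS2 out of seq from whichever boundaries are known.

    If only one side of a spacer is known, the spacer is taken to run to
    the corresponding end of the sequence (a simplifying assumption).
    """
    its1 = its2 = None
    if b.nssu_end is not None or b.s58_start is not None:
        start = b.nssu_end if b.nssu_end is not None else 0
        end = b.s58_start if b.s58_start is not None else len(seq)
        its1 = seq[start:end]
    if b.s58_end is not None or b.nlsu_start is not None:
        start = b.s58_end if b.s58_end is not None else 0
        end = b.nlsu_start if b.nlsu_start is not None else len(seq)
        its2 = seq[start:end]
    return its1, its2


# Artificial sequence: nSSU (5 bp), ITS1 (8 bp), 5.8S (6 bp), ITS2 (7 bp), nLSU (4 bp).
toy = "A" * 5 + "C" * 8 + "G" * 6 + "T" * 7 + "A" * 4
full = extract_subregions(toy, Boundaries(5, 13, 19, 26))
partial = extract_subregions(toy, Boundaries(None, 13, None, None))
print(full, partial)
```

With all four boundaries known, both spacers are returned; with only the start of 5.8S known, the sketch returns everything upstream of 5.8S as ITS1 and no ITS2.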
If the long HMMs do not locate the gene boundaries, an attempt is made to locate the genes using the shorter HMMs to account for sequences with shorter included conserved regions. Partial extractions are performed if one or more, but not all, genes are detected; if, for instance, only the 5′ end of 5.8S is found, the ITS1 is extracted as the region upstream of 5.8S. A set of FASTA files comprises the core output of the software; these include all ITS1 and ITS2 sequences extracted from the input sequences and all sequences for which neither subregion could be found. The program outputs detailed information to the screen, including a summary of the extraction process for each input sequence and the absolute position of each of the subregions in the sequence. The feature to highlight sequences for which none of the flanking regions could be found is of particular relevance to massively parallel pyrosequencing, where artificial sequences consisting entirely of noise are sometimes produced and may pass filters based solely on quality scores (Quince et al. 2009). Furthermore, provided that the query sequences feature the 5′ region of 5.8S, reverse complementary sequences are indicated as such and given in the correct direction. The software performs better the more of nSSU/5.8S/nLSU is available (up to about 50 bp.). We found that the use of HMMs down to ca. 18 bp. in size (matching as little as 18 bp. of the distal part of any of the genes) still provides satisfactory matches with few false positives. Any HMMs shorter than that are difficult to construct if they are still to be used for a wide selection of fungi.
Fig 1. Overview of the fungal ITS region. The spacer ITS1 is found between the 3′ end of the nSSU (18S) gene and the 5′ end of the 5.8S gene, and the ITS2 is found between the 3′ end of 5.8S and the 5′ end of the nLSU (28S) gene. The long bars above the genes indicate the location of the long HMM for that particular gene, and the short bars indicate the location of the short HMMs. All HMMs are positioned in such a way as to cover the very end and the very beginning of the respective genes. The individual lengths of the HMMs were adjusted to reflect the position of the most commonly used primers.
Sequences of poor read quality may pose a problem to the software insofar as the region to which the HMM is compared is obfuscated by incorrect or ambiguous nucleotides. The software may also perform suboptimally on taxa with very deviant rRNA genes, notably the genera Cantharellus and Tulasnella as well as some basal lineages such as the Microsporidia (Feibelman et al. 1994; James et al. 2006; Moncalvo et al. 2006; Taylor & McCormick 2008), and on taxa with large insertions or deletions in the regions targeted by the HMMs (cf. Shinohara et al. 1996; Bhattacharya et al. 2000; Holst-Jensen et al. 2004). Modifying the general fungal HMMs to also include these deviant lineages may detract from the usefulness of the HMMs for the non-deviant lineages; instead, such taxa should be addressed using tailored HMMs.
We evaluated the software on 1500 ITS sequences from all fungal phyla in GenBank (Benson et al. 2008) and from the Quercus phyllosphere pyrosequencing data of Jumpponen & Jones (2009) (Supplementary material 2). All sequences were compared with Hibbett et al. (1995) to identify the subregions, and the results were juxtaposed with those obtained from the software. The respective subregions were identified and extracted successfully for 1462 (97.5 %) of the sequences. The 38 cases where the extraction of either subregion failed were explained by poor sequence data (17 instances), the failure of the HMMs to identify the regions due to the deviant nature of the taxon under scrutiny (11 instances), and false negatives (6 instances). In addition, 4 (0.3 %) false positives (incorrect extractions) were observed.
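The reported evaluation figures can be cross-checked with a short tally. The counts below are taken from the text; reading the 38 unsuccessful cases as the three listed failure causes plus the 4 false positives is an assumption made here, since the three causes alone sum to 34.

```python
# Counts as reported in the evaluation.
total = 1500
successes = 1462
false_positives = 4
failure_causes = {
    "poor sequence data": 17,
    "deviant taxon not matched by the HMMs": 11,
    "false negatives": 6,
}

# Percentages relative to all evaluated sequences, rounded to one decimal.
success_pct = round(100 * successes / total, 1)
false_positive_pct = round(100 * false_positives / total, 1)

# Sequences not extracted successfully, and the sum of the listed causes.
unsuccessful = total - successes
listed_failures = sum(failure_causes.values())

print(success_pct, false_positive_pct, unsuccessful, listed_failures)
```

Under that reading, the listed failures and the false positives together account for all sequences that were not extracted successfully.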
The user may be able to enhance the performance of the software further by tailoring the HMMER E-values (Eddy 1998) to suit any specific property of the target sequences, such as taxonomic affiliation. The very nature of environmental sequencing does, however, speak against static assumptions about which taxa are present in the samples at hand.
Publication date: 2010